Abstract
A central goal in the field of Computer Vision is image understanding. In general, appearance cues are used to detect components of interest and then spatial and hierarchical relations among these components are used to "describe" the image content at the semantic level of interest. Current deep models have reached a stage of evolution in which they are able to learn and transfer low leve…